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INTRODUCTION 

Lattice QCD has witnessed a dramatic progress during 
the last decade. Whereas simulations ten years ago where 
still carried out either in the quenched approximation or 
at sea quark masses far above the physical values, we 
can now calculate with much lighter quarks with masses 
down to the physical values of the up and down quark. 
Only part of this progress is due to increased computer 
resources. The larger part comes from improved algo- 
rithms, in particular the way the fermion determinant, re- 
sponsible for the effects of the sea quarks, is included in 
the simulations. The ideas behind this progress will be 
reviewed in the first part of this contribution. 

By their nature, lattice calculations are never done 
"at the physical point", i.e. in continuum space-time, 
infinite volume and with six dynamical quark flavors at 
precisely the physical masses, ideally taking the effect 
of the full standard model into account. In particular, a 
lattice simulation will always be at a finite lattice spacing 
and the result at several fine lattice spacings a has then to 
be extrapolated to a = 0. The systematic error associated 
with this extrapolation depends on the ability to simulate 
at lattice spacings which are much smaller than the scales 
involved in the problem. For the physics of light quarks 
Aqcd is the relevant scale. For relativistic heavy quarks, 
however, additionally the scale set by the quark's mass 
plays an important role and the lattice spacing a has to 
be sufficiently small, i.e. am q <C 1. For the charm quark, 
e.g., one therefore needs lattice spacings well below 
0.05fm for precision results. 

In the generation of the ensembles fine enough to con- 
trol the continuum extrapolation, however, a grave prob- 
lem occurs: some modes in the simulation move increas- 
ingly slowly, in particular the phenomenon is visible in 
the topological charge of the gauge configuration. The 



simulation rarely tunnels between topological sectors. 

Some critical slowing down is expected in any sim- 
ulation as critical points are approached, here the con- 
tinuum limit. A typical rate of this increase is with the 
second power of the correlation length. In gauge theory, 
however, we find a critical exponent of z ~ 5 instead 
of two, making simulations of fine lattices practically 
impossible. The details of these statements are covered 
in the second half of this writeup. The writeup finishes 
with an overview of the current state of the lattice sim- 
ulations with Wilson fermions. The status of staggered 
quark simulations has been covered by Gottlieb at this 
conferencefl]. 

DYNAMICAL FERMIONS 

Because of their anti-commuting nature, there is no nat- 
ural way to treat fermions in numerical simulations. The 
textbook version of the path integral for QCD is unfor- 
tunately not suitable for numerical simulations since it 
contains integrals over Grassmann variables. They can 
be performed analytically, which introduces the determi- 
nant of the Dirac operator, with the remaining integrals 
over the gauge degrees of freedom U 

N f 

Z = / [dU] l\det[D(m f )} exp(-S g [U}) (1) 
J f 

with So the gauge action and D(mA the lattice version 
of the Dirac operator with quark mass rrif. Since D is 
a large matrix, computing its determinant is virtually 
impossible and it is replaced by a path integral over 
bosonic "pseudo-fermion" fields 

detD 2 (m/) « / exp(-0t-^) , (2) 



which works for even numbers of degenerate flavors, 
because the matrix in the exponent has to be Hermi- 
tian positive definite. Although applying this identity 
seems innocent, using it in a straight forward manner 
simulating the resulting path integral (containing gauge 
and pseudo-fermion fields) with the Hybrid Monte Carlo 
(HMC) algorithm[2] turns out to be unfeasible for light 
quarks and fine lattices. This was famously summarized 
by Ukawa at the Lattice conference in 2001 [3], where he 
gave as the cost of generating a decent sized ensemble 
for two-flavors of dynamical Wilson quarks as 
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with C = 2.8 Tflops years. At the physical value of the 
pion mass, where m K /m p ps 0.17, this would be impos- 
sible even with today's machines. Just physically light 
quarks, for which 3fm are too small, would require a 
peta-fiops machine for several years despite the coarse 
lattice spacing of O.lfm. 

There are two, related, basic insights which lead to the 
progress of the last decade. The first was that the estimate 
of the determinant, which is provided by one realization 
of the pseudo-fermion field 0, is not good enough. Better 
estimators have to be used, but in a way, which is mean- 
ingful from the physics point of view and which also can 
be introduced into lattice QCD algorithms. 

The second insight, which often also provides a solu- 
tion to the first, is that the ultra-violet and the infra-red 
physics of the theory are different and also need to be 
treated differently. By separating the two, one can deal 
with them according to their requirements. But the suc- 
cessful methods to achieve this splitting also provide im- 
proved estimators of the fermion determinant. 

The initial break-through into this direction is by 
Hasenbusch with his mass preconditioning^, 5], where 
first, the fermion matrix is split into two parts 

detD(m / ) = det£>(M) det \p(nif)D~ l {M)\ (4) 

with M a mass larger than m/. The first term is therefore 
dominated by the UV physics, whereas the second part 
is largely infrared. On each of the two terms, Eq. 2 
is applied and the resulting partition function can be 
simulated with the standard HMC algorithm, yielding a 
dramatic speed-up, for the right choice of M, of course. 

Another successful split-up of the determinant is do- 
main decomposition (giving the DD-HMC its name), 
which was introduced by Liischer in Ref. [6], see Fig. 1 
for an illustration. Here, the lattice is decomposed into 
blocks and the Dirac operator is split into one living only 
inside the blocks and a correction term, which accounts 
for the rest 

detD(m/) = |~[ det Dbiock (m/)- detR 

blocks 



FIGURE 1. In the DD-HMC algorithm, the lattice is split 
into blocks, thereby separating the UV from the IR part of the 
fermion action. 



This provides an obvious geometric separation in ultra- 
violet and infra-red part with the corresponding speed-up 
in the algorithm. 

And finally, the third way of splitting the determinant 
leading to the RHMC[7] is to use 



det£> = PJdet£> 



l/JV 



i=\ 

The separation between IR and UV is less clear, however, 
it has the advantage that the fermion flavors do not have 
to come in even numbers of degenerate quarks. It is 
therefore the method of choice for simulating the strange 
and charm quark, using the identity detD = det VD^D. 

The algorithms profit from the separation of the UV 
and the IR because these two contributions evolve on 
very different time scales. The gauge field has much 
faster fluctuations than most of the UV of the fermions, 
which in turn evolve much faster than the infra-red. This 
makes the use of specialized methods possible [8, 9, 10], 
which move the modes on separate time scales. 

For example, just the block decomposition of the DD- 
HMC algorithm leads to a cost formula of 
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with C = 0.5 Tflops years and m the running MS sea 
quark mass at 2GeV [11]. To compare with Eq. 3, we 
note that to leading order in the chiral expansion <x m. 
Thus the exponent governing the cost of going chiral, 
which was 6 ten years ago, has dropped to 2. Equally im- 
portant, the overall normalization is reduced by a factor 
of 100. With mass preconditioning, similar performance 
can be reached[12]. 

To summarize, of the three algorithms, the RHMC 
follows the most the idea of an improved estimator of 
the fermion determinant. It provides a splitting of the 
determinant into equal parts, reducing the fluctuations 



introduced by the pseudo-fermions. The separation of IR 
and UV, however, is not as obvious, but both profit from 
the improved estimation. 

At the other end, the block decomposition of DD- 
HMC is clearly designed for the separation of the UV 
from the IR. Since the action is split into the two parts, 
whose relative size can be tuned by the size of the blocks, 
also a better estimator is provided. 

The mass preconditioning takes an intermediate po- 
sition. The IR/UV separation is softer, since in both fac- 
tors of Eq. 4 contain both parts of the spectrum, however, 
with different weights. But it also provides a clear handle 
on getting a better estimator for the fermion determinant. 
The identity of Eq. 4 can be iterated and with a larger 
number of suitably chosen masses M,-, a systematically 
better estimate can be reached. 



Solver 

All evidence speaks very much in favor of the effi- 
ciency of the simple observation that the infra-red is dif- 
ferent from the ultra-violet part of the physics and that 
respecting this physics can lead to this very beneficial ef- 
fects. The art is to bring it in a feasible and cost-efficient 
way to the numerical simulations. 

It has long been known that the infra-red part of the 
Dirac operator is responsible for the high cost of solv- 
ing the Dirac equation, which is the most expensive 
part of lattice simulations. In some special applications, 
therefore, the low eigenvectors are removed, leading to 
a dramatic speed-up, however, at the cost of computing 
these eigenvectors, whose number grows with the vol- 
ume. Liischer realized the dominant contribution to this 
space can be constructed from localized modes[13] in a 
very cheap way. Basically, very few approximate eigen- 
vectors of the Dirac operator are spatially cut apart and 
then recombined in all possible ways. This gives then the 
major part of the low eigenspace up to some physical 
energy. The achievement of this method is that its cost 
just grows linearly with the volume, instead of previous 
methods, which had at least a V 2 scaling. 

For the solution of the Dirac equation, removing the 
space constructed from these local modes virtually elim- 
inates all increase in the cost of this operation as the 
quarks get lighter, i.e. the factor of m~ x in Eq. 5. At mod- 
erately light quark masses, gains of a factor of 10 have 
been observed. These savings have been demonstrated in 
the original DD-HMC setup[14] as well as in mass pre- 
conditioned HMC[12]. A very similar idea is the adap- 
tive multi-grid[15]. 

It remains to be noted, that the idea of treating the 
infra-red part of the Dirac operator's spectrum special 
has not stopped here. Also for observables, computing 



the contribution from the infra-red more precisely than 
the UV part has proven to be very beneficial[16, 17]. 

APPROACHING THE CONTINUUM 

The advances in the fermionic sectors have led to sig- 
nificant optimism and large ensembles of gauge config- 
urations at different sea quark masses and lattice spac- 
ings have been produced. However, going to finer lat- 
tices, a severe slowing down of the simulations has been 
observed[18, 19]. This is not new phenomenon and had 
been observed previously in particular in pure gauge the- 
ory, however, with dynamical fermions the problem is 
slightly less severe. 



Background 

In Markov Chain Monte Carlo simulations, an algo- 
rithm is a probabilistic procedure to generate a sequence 
of field configurations E/,- 

Ui -S- U 2 -> U 3 -> >U N 

given by a transition probability T(U' <—U). Under cer- 
tain conditions, in particular stability 

P(U)='£P(U')T(U ^U') 
u> 

the Ui are then distributed according to a given probabil- 
ity distribution P. Because of this process, the probability 
distribution of C/ (+ i depends on [/,-, which leads to corre- 
lations among subsequent measurements of observables 
At =A(Uj). These are described by the auto-correlation 
function 

r A (t) = {(A i -{A))(A i+t -{A))) 

and in an even more concise way by the integral, the 
auto-correlation time 

The error Oa for an estimate from N subsequent mea- 
surements is then given by 

a A = V var ( A ) 

This is the ordinary error formula, which effectively dif- 
fers from the one without correlation just by the reduc- 
tion of the number of measurements N by a factor of 

2 1i n t. 



Critical slowing down 

The cost of a simulation is thus proportional to Tj nt and 
in the continuum limit — approaching a continuous phase 
transition — one expects it to grow with a power-law 

Ti llt « 0T Z 

with a the lattice spacing and z the dynamical critical ex- 
ponent. The whole phenomenon is called critical slow- 
ing down and can be viewed in analogy to static critical 
phenomena and their scaling laws. However, it has to be 
stressed that the value of z does not only depend on the 
properties of the underlying theory, but also on the algo- 
rithm with which it is simulated. For some spin models, 
algorithms with z ~ have been found, like cluster or 
multi-grid algorithms, however, for Yang-Mills theory or 
QCD, such an algorithm does not exist. For a generic 
small step algorithm, one expects z ~ 2, based on the 
idea that information is distributed in a random walk and 
needs to spread over a correlation length (here propor- 
tional to a ) to give an independent measurement. The 
cost formulae in Eqs. 3 and 5 seem to assume z ~ 1. 

The cost of an independent measurement is propor- 
tional to the cost of a single update times the number of 
updates needed. A single update will most certainly have 
a cost proportional to a~ 4 for fixed volume in four dimen- 
sions. For Hybrid Monte Carlo, cT 5 is a typical behavior. 
Combined with the effect of critical slowing down, this 
gives a~ 5 ~ z , which can give a very sizeable effect for a 
large z- 



Hybrid Monte Carlo 

As described above, virtually all current simulations 
are using a variant of the HMC algorithm. It is there- 
fore pivotal to know the behavior of their cost when the 
continuum limit is approached. Although this cost will 
depend on the observable in question, a safe simulation 
needs to be in a situation where all observables decorre- 
late much faster than the full statistics. This is necessary, 
because the modes, which are being moved by the tran- 
sition matrix T, couple to all observables — barring some 
symmetry explicitly prohibiting the coupling. The situa- 
tion needs to be such that one is able to detect from the 
simulation itself the coupling of slowly moving modes to 
the observables in question, which requires that we see 
sufficient movement in all possible quantities. 

In order to determine the dynamical critical exponents 
of lattice simulations, we performed a pure gauge study 
with the Wilson gauge action using the same algorithms 
as used for QCD simulations, mainly DD-HMC, but we 
also tested the behavior of pure HMC[19, 20]. The main 
result is displayed in Fig. 2. It shows how the integrated 
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FIGURE 2. Integrated auto-correlation time vs. the lattice 
spacing. Q 2 is the topological charge squared, whose behavior 
is compatible with z ~ 5, W is the square Wilson loop which 
for which z ~ 1 . 



auto-correlation time rises as the continuum limit is ap- 
proached. This behavior is different for different observ- 
ables. We display the topological charge, for which a 
very steep rise compatible with a dynamical critical ex- 
ponent of z«5. However, an exponential behavior — 
for which evidence is presented in [21] — cannot be ex- 
cluded. The picture is very different for the (smeared) 
square Wilson loop of size 0.5fm, which we show be- 
cause it turns out to be the loop with the slowest evo- 
lution. The rise is compatible with z ~ 1, actually less 
severe than a simple random walk picture suggests. This 
is already an indication that a decoupling between the 
slow modes governing the topological charge and some 
other observables occurs. The problem is that in princi- 
ple, one has to check observable by observable, whether 
the situation is under control, or whether the slow modes 
contribute significantly. 

In Fig. 3, we make a comparison between the auto- 
correlation behavior of pure gauge theory and of dy- 
namical two-flavor simulations with Wilson quarks at 
roughly the same lattice spacing. It is striking that the 
auto-correlation functions are virtually identical between 
the two set-ups for all observables at which we looked, 
including meson masses, quark masses, decay constants 
and the plaquette. Only in the topological charge a sig- 
nificant difference appears. The same phenomenon can 
also be observed in pure gauge theory, when moving to a 
different gauge action. We tested it for the Iwasaki gauge 
action and also found the only difference compared to 
the Wilson gauge action in the auto-correlations of the 
topological modes. 

The slowing down of the topological charge does not 
come as a complete surprise. It has long been part of 
the folklore of the field that in the continuum there is a 
separation between the topological charge sectors. From 
the point of view of the lattice, however, the emergence 
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FIGURE 3. Comparison between the normalized auto-correlation function p(t = F(t) /r(0) for pure gauge and dynamical Nf = 2 
simulations. On the left, we show the smeared plaquette, then in the center the quark mass corresponding to the strange quark, on 
the right the square of the topological charge. 



of these sectors has long been unclear. A typical gauge 
configuration is very rough and contains a large number 
of small objects (dislocations) which carry topological 
charge. This makes a separation in charge sectors of the 
connected field space seem arbitrary. 

Recently, Liischer[22] proposed a new definition of 
the charge using the gradient flow on the field space, 
smoothing the fields in a well defined way. He could 
show that the configurations, to which a certain charge 
cannot be attributed unambiguously, die out very quickly 
as the lattice spacing is reduced with a power of roughly 
cT 6 . It is clear, that a small step algorithm has prob- 
lem with this kind of situation: since the algorithm is 
supposed to do importance sampling, it will stay during 
the evolution in the region of important configurations; 
steps bridging a less important region are by construc- 
tion not included. If the configurations between sectors 
are rapidly becoming less important, the transition from 
one topological sector to the other will obviously be sup- 
pressed accordingly. 

STATUS OF WILSON FERMION 
SIMULATIONS 

The ability to simulate QCD at light fermion masses and 
small lattice spacings has put many collaborations into 
the position to generate ensembles on which interest- 
ing physics can be studied. An overview of the status 
of the simulations with Wilson fermions can be found 
in Tab. 1. All collaborations use formulations, in which 
leading lattice effects are removed. CLS[23, 24] and 
PACS-CS[25] both use non-perturbative improvement, 
ETMC [26, 27] uses twisted mass fermions at maximal 
twist which profit from automatic G(a) improvement. 
The BMW collaboration[28] uses tree-level improved 
Wilson fermions which couple through "HEX" smeared 
links to the gauge field. The QCDSF and UKQCD col- 
laborations also use gauge field smoothing in their non- 
perturbatively improved SLiNC fermions [29]. All col- 



laborations have produced data sets with light sea quarks. 
The problems described in the previous section limit cur- 
rent studies to about 0.05fm, and even there, worries 
about slow modes are in place. There are two major ob- 
stacles for even lighter fermions these days: finite volume 
effects and instabilities or possible lattice phase transi- 
tions. 

The effects of finite volume can easily be kept small 
by a lattice size much larger than the pion wave length 
L ^> m~ 1 . However, larger L still comes with the fifth 
power in all cost formulae. Cutting the pion mass by 
half therefore requires roughly a factor of 30 in computer 
time, just from this criterion alone. 

The second obstacle are instabilities or even phase 
transitions as the quark mass is lowered. Since this write- 
up is about Wilson quarks, which explicitly break chi- 
ral symmetry, the latter is a real possibility (e.g. the 
Aoki phase discussed in Ref. [30]). Instabilities in the 
simulations occurring at small quark mass have been 
discussed[31], which come from the fact that the spec- 
trum of the Wilson operator, because of the lack of chiral 
symmetry, is not bounded from below by the quark mass. 

Therefore, all types of Wilson fermions can only reach 
a finite minimal quark mass for a given lattice spacing. 
The finer the lattice, the smaller this minimal mass. How 
fine a lattice one needs to simulate the light quarks at 
their physical parameters depends on the action. The 
way this problem is mitigated in current set-ups used by 
the PACS-CS[25] or the BMW[28] collaboration is to 
use a special gauge action or smeared fermion action, 
respectively, which suppress the dislocations, thought 
to be the cause of the problem. These collaborations 
therefore could report simulations at the physical value 
of the light quark masses. 

SUMMARY 

Lattice computations have gone a long way. Light quarks 
are light and their effect is taken into account in the 



TABLE 1. Status of current Wilson fermion simulations: Nf is the number 
of sea quark flavors, a the lattice spacing and m K the minimal sea pion mass. 
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270. . . 


PACS-CS 


NP imp. Wilson 
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0.09 
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vacuum. However, they are still expensive. In particular, 
decreasing the lattice spacing comes with a significant 
critical slowing down of observables like the topologi- 
cal charge. The measured z ~ 5 means that the total cost 
of simulations rises like a" 10 . This requires either very 
sizable computing resources or a better algorithm which 
moves these fields more efficiently. Such an algorithm 
has not been found yet. Certainly, the phenomenon casts 
doubt on whether the statistical errors in current simula- 
tions are truly under control and, with it, the extrapola- 
tion to the continuum limit which is so crucial for final 
answers. 

Given the decoupling between the modes which are 
prominent in the topological charge and responsible for 
its slow evolution and, e.g. the Wilson loop apparent 
in Fig. 2, one might wonder whether it matters at all. 
In particular in a large volume, the global topological 
charge should have very little effect on any reasonably 
local observable. However, from a general Monte Carlo 
perspective, we can determine the expectation values 
observables, their fluctuations and the associated auto- 
correlations only from the simulation itself. Since we 
now know of auto-correlations of the size of a typical 
total statistics, we have to worry about even longer, un- 
detected ones. To gain confidence in any Markov Chain 
Monte Carlo, it is necessary that all correlations are much 
smaller than the total length of the chain. To believe that 
only the topological charge is affected by the increase in 
auto-correlations just seems naive. 

In the end, for the continuum extrapolation, the lattice 
simulations are at a similar point now as they were ten 
years ago for the chiral extrapolation. Back then, sim- 
ulating pions with less then 500MeV seemed impossi- 
ble, with costs exploding with the sixth power. Now we 
are facing a similar exponent for the increasing auto- 
correlations towards the continuum. But the solution of 
the problem of the chiral limit gives hope that also the 
continuum extrapolation will find a solution in the not 
too distant future. 
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